ABSTRACT

Recent improvements in positioning technology have made massive moving object data widely available. One important analysis task is to find the moving objects that travel together. Existing methods impose a strong constraint when defining a moving object cluster: they require the moving objects to stick together for consecutive timestamps. Our key observation is that the moving objects in a cluster may actually diverge temporarily and congregate at certain timestamps.

Motivated by this, we propose the concept of swarm, which captures the moving objects that move within clusters of arbitrary shape for certain timestamps that are possibly non-consecutive. The goal of this paper is to find all discriminative swarms, namely closed swarms. While the search space for closed swarms is prohibitively large, we design a method, ObjectGrowth, to retrieve the answer efficiently. In ObjectGrowth, two effective pruning strategies are proposed to greatly reduce the search space, and a novel closure checking rule is developed to report closed swarms on-the-fly. Empirical studies on real data as well as large synthetic data demonstrate the effectiveness and efficiency of our methods.

1. INTRODUCTION

Telemetry attached to wildlife, GPS devices installed in cars, and mobile phones carried by people have enabled the tracking of almost any kind of moving object. Positioning technologies make it possible to accumulate large amounts of moving object data. Hence, analysis of such data to find interesting movement patterns draws increasing attention in animal studies, traffic analysis, and law enforcement applications.

A useful data analysis task in movement is to find moving object clusters, a loosely defined and general task of finding groups of moving objects that travel together sporadically. The discovery of such clusters has facilitated in-depth study of animal behaviors, route planning, and vehicle control. A moving object cluster can be defined in both spatial and temporal dimensions: (1) a group of moving objects should be geometrically close to each other, and (2) they should be together for at least some minimum time duration.


Figure 1: Loss of interesting moving object clusters in the definition of moving cluster, flock and convoy.

There have been many recent studies on mining moving object clusters. One line of study is to find moving object clusters, including moving clusters [14], flocks [10, 9, 4], and convoys [13, 12]. What these patterns have in common is that they require the group of moving objects to be together for at least k consecutive timestamps, which might not be practical in real cases. For example, if we set k = 3 in Figure 1, no moving object cluster can be found. But intuitively, these four objects travel together even though some objects temporarily leave the cluster at some snapshots. If we relax the consecutive time constraint and still set k = 3, o1, o3 and o4 actually form a moving object cluster. In other words, enforcing the consecutive time constraint may result in the loss of interesting moving object clusters.


Figure 2: Loss of interesting moving object clusters in trajectory clustering.

Another line of study of moving object clustering is trajectory clustering [20, 6, 8, 17], which emphasizes the geometric or spatial closeness of object trajectories. However, objects that are essentially moving together may not share similar geometric trajectories. As illustrated in Figure 2, from the geometric point of view, these two trajectories may be rather different. But if we pick the timestamps when they are close, such as t1, t3, t5 and t9, the two objects should be considered as traveling together. In real life, there are often cases where a set of moving objects (e.g., birds, flies, and mammals) hardly stick together all the time; they do travel together, but only gather together at some timestamps.

In this paper, we propose a new movement pattern, called swarm, which is a more general type of moving object cluster. More precisely, a swarm is a group of moving objects containing at least min_o individuals that are in the same cluster for at least min_t timestamp snapshots. If we denote this group of moving objects as O and the set of these timestamps as T, a swarm is a pair (O, T) that satisfies the above constraints. Specifically, the timestamps in T are not required to be consecutive, the detailed geometric trajectory of each object becomes unimportant, and clustering methods and/or measures can be flexible and application-dependent (e.g., density-based clustering vs. Euclidean distance-based clustering). By definition, if we set min_o = 2 and min_t = 3, we can find swarm ({o1, o3, o4}, {t1, t3, t4}) in Figure 1 and swarm ({o1, o2}, {t1, t3, t5, t9}) in Figure 2. Such swarms are interesting but cannot be captured by previous moving object cluster detection or (geometry-based) trajectory clustering methods. To avoid finding redundant swarms, we further propose the closed swarm concept (see Section 3). The basic idea is that if (O, T) is a swarm, it is unnecessary to output any subset O' ⊆ O and T' ⊆ T, even if (O', T') also satisfies the swarm requirements. For example, in Figure 2, swarm ({o1, o2}, {t1, t3, t9}) is redundant even though it satisfies the swarm definition, because there is a closed swarm ({o1, o2}, {t1, t3, t5, t9}).

Efficient discovery of the complete set of closed swarms in a large moving object database is a non-trivial task. First, the number of all possible combinations is exponential (i.e., 2^|O_db| × 2^|T_db|), whereas the discovery of moving clusters, flocks or convoys has polynomial solutions due to the stronger constraint posed by their definitions based on k consecutive timestamps. Second, although the problem is defined in a form similar to frequent pattern mining [1, 11], none of the previous work [1, 11, 25, 23, 19, 21] solves exactly the same problem as finding swarms. In the typical frequent pattern mining problem, the input is a set of transactions and each transaction contains a set of items. However, the input of our problem is a sequence of timestamps, and there is a collection of (overlapping) clusters at each timestamp (detailed in Section 3). Thus, the discovery of swarms poses a new problem that needs to be solved by specifically designed techniques.

Facing this huge potential search space, we propose an efficient method, ObjectGrowth. In ObjectGrowth, besides the commonly used Apriori Pruning rule, we design a novel Backward Pruning rule, which uses a simple checking step to stop unnecessary further search. This pruning rule covers several redundant cases at the same time. After our pruning rules cut a large portion of unpromising candidates, the number of remaining candidate closed swarms could still be large. To avoid time-consuming pairwise closure checking in a post-processing step, we present a Forward Closure Checking step that reports closed swarms on-the-fly. Using this checking rule, no space is needed to store candidates and no extra time is spent on post-processing to check the closure property.

In summary, the contributions of this paper are as follows.

• A new concept, swarm, and its associated concept, closed swarm, are introduced, which enable us to find relaxed temporal moving object clusters in real-world settings.

• ObjectGrowth is developed for efficiently mining closed swarms. Two pruning rules are developed to efficiently reduce the search space, and a closure checking step is integrated into the search process to output closed swarms immediately.

• The effectiveness as well as efficiency of our methods are demonstrated on both real and synthetic moving object databases.

The remainder of the paper is organized as follows. Section 2 discusses related work. The definitions of swarms and closed swarms are given in Section 3. We introduce the ObjectGrowth method in Section 4. Experiments testing effectiveness and efficiency are shown in Section 5. Finally, our study is concluded in Section 6. The proofs, pseudocode, and detailed discussions are given in the Appendix.

2. RELATED WORK

Related work on moving object clustering can be categorized into two lines of research: moving object cluster discovery and trajectory clustering. The former focuses on individual moving objects and tries to find clusters of objects with similar moving patterns or behaviors, whereas the latter takes a more geometric view to cluster trajectories. The related work, especially the methods used for direct comparison, is described in more detail in Appendix C.

Flock was first introduced in [16] and further studied in [10, 9, 2]. A flock is defined as a group of moving objects moving in a disc of a fixed size for k consecutive timestamps. Another similar definition, moving cluster [14], tries to find a group of moving objects that have a considerable portion of overlap at any two consecutive timestamps. A recent study by Jeung et al. [13, 12] proposes convoy, an extension of flock, where spatial clustering is based on density. Compared with all these definitions, swarm is a more general one that does not require k consecutive timestamps.

Group pattern, defined in [22], is the pattern most similar to swarm. Group patterns are the moving objects that travel within a radius for certain timestamps that are possibly non-consecutive. Even though it considers relaxation of the time constraint, the group pattern definition restricts the size and shape of moving object clusters by specifying the disk radius. Moreover, redundant group patterns make the algorithm exponentially inefficient.

Another line of research is to find trajectory clusters, which reveal the common paths of a group of moving objects. The first and most difficult challenge for trajectory clustering is to give a good definition of similarity between two trajectories. Many methods have been proposed, such as Dynamic Time Warping (DTW) [24], Longest Common Subsequences (LCSS) [20], Edit Distance on Real Sequence (EDR) [6], and Edit distance with Real Penalty (ERP) [5]. Gaffney et al. [8] propose trajectory clustering methods based on probabilistic modeling of a set of trajectories. As pointed out by Lee et al. [17], a distance measure established on whole trajectories may miss interesting common paths in sub-trajectories. To find clusters based on sub-trajectories, Lee et al. [17] proposed a partition-and-group framework. But this framework cannot find swarms, because the real trajectories of the objects in a swarm may be complicated and different. Work on subspace clustering [15, 3] can also be applied to find sub-trajectory clusters. However, these works address how to efficiently apply DBSCAN in high-dimensional space, and such clustering techniques still cannot be directly applied to find swarm patterns.

3. PROBLEM DEFINITION

Let O_db = {o1, o2, ..., on} be the set of all moving objects and T_db = {t1, t2, ..., tm} be the set of all timestamps in the database. A subset of O_db is called an objectset O. A subset of T_db is called a timeset T. The sizes |O| and |T| are the number of objects in O and the number of timestamps in T, respectively.

Database of clusters. A database of clusters, C_db = {C_t1, C_t2, ..., C_tm}, is the collection of snapshots of the moving object clusters at timestamps {t1, t2, ..., tm}. We use C_ti(oj) to denote the set of clusters that object oj is in at timestamp ti. Note that an object could belong to several clusters at one timestamp. In addition, for a given objectset O, we write C_ti(O) = ∩_{oj ∈ O} C_ti(oj) for short. To make our framework more general, we take clustering as a preprocessing step. The clustering methods could differ across scenarios. We leave the details of this step to Appendix D.1.

Swarm and Closed Swarm. A pair (O, T) is said to be a swarm if all objects in O are in the same cluster at every timestamp in T. Specifically, given two minimum thresholds min_o and min_t, for (O, T) to be a swarm, where O = {o_i1, o_i2, ..., o_ip} ⊆ O_db and T ⊆ T_db, it needs to satisfy three requirements:

(1) |O| ≥ min_o: there should be at least min_o objects.

(2) |T| ≥ min_t: objects in O are in the same cluster for at least min_t timestamps.

(3) C_ti(o_i1) ∩ C_ti(o_i2) ∩ ... ∩ C_ti(o_ip) ≠ ∅ for any ti ∈ T: there is at least one cluster containing all the objects in O at each timestamp in T.
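As a minimal sketch, requirements (1)-(3) can be checked directly once the cluster snapshots are in memory. The encoding is an assumption made for illustration: each timestamp maps to a list of clusters, each cluster a Python frozenset of object identifiers, filled with the snapshots of the running example listed in Figure 4(b). The names SNAPSHOTS and is_swarm are hypothetical. Requirement (3) is tested as "some single cluster contains all of O", which is equivalent to the intersection C_ti(o_i1) ∩ ... ∩ C_ti(o_ip) being non-empty.

```python
from typing import Dict, FrozenSet, Iterable, List

Cluster = FrozenSet[str]

# Hypothetical encoding of the Figure 4(b) snapshots: timestamp -> clusters.
# Clusters may overlap (o1 belongs to two clusters at t2).
SNAPSHOTS: Dict[str, List[Cluster]] = {
    "t1": [frozenset({"o1", "o2", "o4"}), frozenset({"o3"})],
    "t2": [frozenset({"o1", "o3"}), frozenset({"o1", "o2", "o4"})],
    "t3": [frozenset({"o1"}), frozenset({"o2", "o3", "o4"})],
    "t4": [frozenset({"o3"}), frozenset({"o1", "o2", "o4"})],
}


def is_swarm(objs: FrozenSet[str], times: Iterable[str],
             snapshots: Dict[str, List[Cluster]],
             min_o: int, min_t: int) -> bool:
    """Check requirements (1)-(3) of the swarm definition for (O, T)."""
    times = list(times)
    if len(objs) < min_o:    # (1) |O| >= min_o
        return False
    if len(times) < min_t:   # (2) |T| >= min_t
        return False
    # (3) at every t in T, one cluster contains every object of O,
    # i.e. the per-object cluster sets have a non-empty intersection.
    return all(any(objs <= c for c in snapshots[t]) for t in times)
```

For instance, ({o2, o4}, {t1, t3, t4}) passes all three checks with min_o = min_t = 2, whereas ({o1, o2}, {t1, t3}) fails requirement (3) at t3.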

To avoid mining redundant swarms, we further give the definition of closed swarm. A swarm (O, T) is object-closed if, fixing T, O cannot be enlarged (∄O' s.t. (O', T) is a swarm and O ⊊ O'). Similarly, a swarm (O, T) is time-closed if, fixing O, T cannot be enlarged (∄T' s.t. (O, T') is a swarm and T ⊊ T'). Finally, a swarm (O, T) is a closed swarm iff it is both object-closed and time-closed. Our goal is to find the complete set of closed swarms.

We use the following example as a running example in the remaining sections to give an intuitive explanation of our methods. We set min_o = 2 and min_t = 2 in this example.


Figure 3: Snapshots of object clusters at t1 to t4.


Example 1. (Running Example) Figure 3 shows the input of our running example. There are 4 objects and 4 timestamps (O_db = {o1, o2, o3, o4}, T_db = {t1, t2, t3, t4}). Each sub-figure is a snapshot of object clusters at one timestamp. It is easy to see that o1, o2, and o4 travel together most of the time, and that o2 and o4 form an even more stable swarm, since they are close to each other over the whole time span. Given min_o = 2 and min_t = 2, there are 15 swarms in total: ({o1, o2}, {t1, t2}), ({o1, o4}, {t1, t2}), ({o2, o4}, {t1, t3, t4}), and so on.

But it is obviously redundant to output swarms like ({o2, o4}, {t1, t2}) and ({o2, o4}, {t2, t3, t4}) (not time-closed), since both of them can be enlarged to form another swarm: ({o2, o4}, {t1, t2, t3, t4}). Similarly, ({o1, o2}, {t1, t2, t4}) and ({o2, o4}, {t1, t2, t4}) are redundant (not object-closed), since both of them can be enlarged to ({o1, o2, o4}, {t1, t2, t4}). There are only two closed swarms in this example: ({o2, o4}, {t1, t2, t3, t4}) and ({o1, o2, o4}, {t1, t2, t4}).
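The two closed swarms of Example 1 can be double-checked by brute force: enumerate every pair (O, T) in the 2^|O_db| × 2^|T_db| space, keep the swarms, and discard any swarm that can be enlarged in O with T fixed, or in T with O fixed. The sketch below assumes the same frozenset encoding of Figure 4(b) as before, and its helper names are hypothetical. It is only practical for toy inputs, which is exactly the cost that ObjectGrowth is designed to avoid.

```python
from itertools import chain, combinations
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Cluster = FrozenSet[str]
Pair = Tuple[FrozenSet[str], FrozenSet[str]]

# Hypothetical encoding of the Figure 4(b) snapshots.
SNAPSHOTS: Dict[str, List[Cluster]] = {
    "t1": [frozenset({"o1", "o2", "o4"}), frozenset({"o3"})],
    "t2": [frozenset({"o1", "o3"}), frozenset({"o1", "o2", "o4"})],
    "t3": [frozenset({"o1"}), frozenset({"o2", "o3", "o4"})],
    "t4": [frozenset({"o3"}), frozenset({"o1", "o2", "o4"})],
}


def all_subsets(items: Iterable[str]) -> List[FrozenSet[str]]:
    """Every subset of `items`, including the empty set."""
    items = sorted(items)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]


def is_swarm(objs, times, snapshots, min_o: int, min_t: int) -> bool:
    return (len(objs) >= min_o and len(times) >= min_t and
            all(any(objs <= c for c in snapshots[t]) for t in times))


def closed_swarms_brute_force(snapshots, min_o: int, min_t: int) -> Set[Pair]:
    objects = set().union(*(c for cs in snapshots.values() for c in cs))
    swarms = [(O, T) for O in all_subsets(objects)
              for T in all_subsets(snapshots)
              if is_swarm(O, T, snapshots, min_o, min_t)]
    closed: Set[Pair] = set()
    for O, T in swarms:
        # object-closed: no swarm with the same T and a strictly larger O
        object_closed = not any(T2 == T and O < O2 for O2, T2 in swarms)
        # time-closed: no swarm with the same O and a strictly larger T
        time_closed = not any(O2 == O and T < T2 for O2, T2 in swarms)
        if object_closed and time_closed:
            closed.add((O, T))
    return closed
```

On this input, closed_swarms_brute_force(SNAPSHOTS, 2, 2) returns exactly the two closed swarms listed above.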


(a) FP mining problem:
Transaction 1: {a, b, c}
Transaction 2: {a, c}
Transaction 3: {a, c, d}
Transaction 4: {b, d}

(b) Swarm mining problem:
t1: {{o1, o2, o4}, {o3}}
t2: {{o1, o3}, {o1, o2, o4}}
t3: {{o1}, {o2, o3, o4}}
t4: {{o3}, {o1, o2, o4}}

Figure 4: Difference between frequent pattern mining and swarm pattern mining.

Note that even though our problem is defined in a form similar to frequent pattern mining [11], none of the previous work in the frequent pattern (FP) mining area can solve exactly our problem. As shown in Figure 4, the FP mining problem takes transactions as input, whereas swarm discovery takes the clusters at each timestamp as input. If we treat each timestamp as one transaction, each "transaction" is a collection of "itemsets" rather than just one itemset. If we treat each cluster as one transaction, the support measure might be counted incorrectly. For example, if we do so for the example in Figure 4, the support of o1 is wrongly counted as 5, because it is counted twice at t2. Therefore, there is no trivial transformation of the FP mining problem into the swarm mining problem. The difference demands new techniques to specifically solve our problem.
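The support miscount can be reproduced in a few lines. This is an illustrative sketch under an assumed encoding (the Figure 4(b) snapshots as Python frozensets; the function names are hypothetical): counting one transaction per cluster gives o1 a support of 5, while counting the timestamps at which o1 belongs to at least one cluster gives the intended 4.

```python
from typing import Dict, FrozenSet, List

# Hypothetical encoding of the Figure 4(b) snapshots.
SNAPSHOTS: Dict[str, List[FrozenSet[str]]] = {
    "t1": [frozenset({"o1", "o2", "o4"}), frozenset({"o3"})],
    "t2": [frozenset({"o1", "o3"}), frozenset({"o1", "o2", "o4"})],
    "t3": [frozenset({"o1"}), frozenset({"o2", "o3", "o4"})],
    "t4": [frozenset({"o3"}), frozenset({"o1", "o2", "o4"})],
}


def cluster_as_transaction_support(obj: str, snapshots) -> int:
    """Naive transformation: every cluster is a separate transaction."""
    return sum(obj in c for clusters in snapshots.values() for c in clusters)


def timestamp_support(obj: str, snapshots) -> int:
    """Swarm semantics: timestamps at which obj is in at least one cluster."""
    return sum(any(obj in c for c in clusters)
               for clusters in snapshots.values())


print(cluster_as_transaction_support("o1", SNAPSHOTS))  # 5 (t2 counted twice)
print(timestamp_support("o1", SNAPSHOTS))               # 4
```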

4. DISCOVERING CLOSED SWARMS

The pattern we are interested in here, swarm, is a pair (O, T) of an objectset O and a timeset T. At first glance, the number of different swarms could be 2^|O_db| × 2^|T_db|, i.e., the size of the search space. However, for a closed swarm, the following lemma shows that if the objectset is given, the corresponding maximal timeset can be uniquely determined.

Lemma 1. For any swarm (O, T) with O ≠ ∅, there is a unique time-closed swarm (O, T') s.t. T ⊆ T'.

Its proof can be found in Appendix B. In the running example, if we set the objectset as {o1, o2}, its corresponding maximal timeset is {t1, t2, t4}. Thus, we only need to search all subsets of O_db; an alternative search direction based on timesets is discussed in Appendix E.1. In this way, the search space shrinks from 2^|O_db| × 2^|T_db| to 2^|O_db|.
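For the running example, where |O_db| = |T_db| = 4, the reduction works out as follows:

```latex
\underbrace{2^{|O_{db}|} \times 2^{|T_{db}|}}_{\text{all } (O,\,T) \text{ pairs}} = 2^{4} \times 2^{4} = 256
\quad \longrightarrow \quad
\underbrace{2^{|O_{db}|}}_{\text{objectsets only}} = 2^{4} = 16
```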

Basic idea of our algorithm. From the analysis above, we see that to find closed swarms, it suffices to search all the subsets O of the moving objects O_db. We perform a depth-first search over all subsets of O_db, which is illustrated as a pre-order tree traversal in Figure 5: tree nodes are labeled with numbers denoting the depth-first search order (nodes without numbers are pruned).
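A minimal sketch of the enumeration behind Figure 5, assuming the objects are given in a fixed order: each child extends its parent with one object positioned after the last object added, so every subset is produced exactly once and the generator yields the subsets in DFS pre-order. The function name dfs_objectsets is hypothetical; no pruning is applied yet.

```python
from typing import Iterator, Sequence, Tuple


def dfs_objectsets(objects: Sequence[str]) -> Iterator[Tuple[str, ...]]:
    """Yield every subset of `objects` in pre-order of the enumeration tree."""

    def visit(prefix: Tuple[str, ...], last: int) -> Iterator[Tuple[str, ...]]:
        yield prefix
        # children only append objects after the last one added
        for i in range(last + 1, len(objects)):
            yield from visit(prefix + (objects[i],), i)

    yield from visit((), -1)


order = list(dfs_objectsets(["o1", "o2", "o3"]))
# [(), ('o1',), ('o1', 'o2'), ('o1', 'o2', 'o3'), ('o1', 'o3'),
#  ('o2',), ('o2', 'o3'), ('o3',)]
```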

Even so, the search space of all objectsets in O_db (2^|O_db|) is still huge, so efficient pruning rules are needed to speed up the search process. We design two efficient pruning rules to further shrink the search space. The first pruning rule, called Apriori Pruning, stops traversing a subtree when we find that further traversal cannot satisfy min_t. The second pruning rule, called Backward Pruning, makes use of the closure property. It checks whether there is a superset of the current objectset that has the same maximal corresponding timeset as the current one. If so, traversing the subtree under the current objectset is meaningless. In previous studies [19, 25, 21] on closed frequent pattern mining, there are three pruning rules (i.e., item merging, sub-itemset pruning, and item skipping) to cover different redundant search cases (the details of these techniques are given in Appendix C.3). We use a single pruning rule to cover all these cases, and we will prove that we only need to examine each superset of the current objectset with exactly one more object. Armed with these two pruning rules, the size of the search space can be significantly reduced.

After pruning the invalid candidates, the remaining ones may or may not be closed swarms. A brute-force solution is to check every pair of candidates to see whether one makes the other violate the closed swarm definition. But the time spent on this post-processing step is quadratic in the number of candidates, which is costly. Our proposal, Forward Closure Checking, embeds a checking step in the search process. This checking step immediately determines whether a swarm is closed after the subtree under the swarm has been traversed, and takes little extra time (in fact, O(1) additional time for each swarm in the search space). Thus, closed swarms are discovered on-the-fly and no extra post-processing step is needed.

In the following subsections, we present the details of our ObjectGrowth algorithm. The proofs of lemmas and theorems are given in Appendix B.

4.1 The ObjectGrowth Method

The ObjectGrowth method is a depth-first search (DFS) framework based on the objectset search space (i.e., the collection of all subsets of O_db). First, we introduce the definition of maximal timeset. Intuitively, for an objectset O, the maximal timeset T_max(O) is the one such that (O, T_max(O)) is a time-closed swarm. For an objectset O, the maximal timeset T_max(O) is well-defined, because Lemma 1 shows the uniqueness of T_max(O).

Definition 4.1. (Maximal Timeset) Timeset T = {tj} is a maximal timeset of objectset O = {o_i1, o_i2, ..., o_im} if:

(1) C_tj(o_i1) ∩ C_tj(o_i2) ∩ ... ∩ C_tj(o_im) ≠ ∅, ∀tj ∈ T;

(2) ∄tx ∈ T_db \ T s.t. C_tx(o_i1) ∩ ... ∩ C_tx(o_im) ≠ ∅.

We use T_max(O) to denote the maximal timeset of objectset O.

In the running example, for O = {o1, o2}, T_max(O) = {t1, t2, t4} is the maximal timeset of O.
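Under the same assumed encoding as before (clusters as frozensets, timestamps kept in order in a dict), Definition 4.1 translates into a single pass over the timestamps: keep t exactly when one cluster at t contains every object of O. Condition (2) holds automatically because no qualifying timestamp is skipped. This is a sketch for illustration, and max_timeset is a hypothetical name.

```python
from typing import Dict, FrozenSet, List

# Hypothetical encoding of the Figure 4(b) snapshots.
SNAPSHOTS: Dict[str, List[FrozenSet[str]]] = {
    "t1": [frozenset({"o1", "o2", "o4"}), frozenset({"o3"})],
    "t2": [frozenset({"o1", "o3"}), frozenset({"o1", "o2", "o4"})],
    "t3": [frozenset({"o1"}), frozenset({"o2", "o3", "o4"})],
    "t4": [frozenset({"o3"}), frozenset({"o1", "o2", "o4"})],
}


def max_timeset(objs: FrozenSet[str], snapshots) -> List[str]:
    """T_max(O): every timestamp at which one cluster contains all of O."""
    return [t for t, clusters in snapshots.items()
            if any(objs <= c for c in clusters)]


print(max_timeset(frozenset({"o1", "o2"}), SNAPSHOTS))  # ['t1', 't2', 't4']
```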

The objectset space is visited in DFS order. When visiting each objectset O, we compute its maximal timeset, and three rules are then used to prune redundant search and to detect the closed swarms on-the-fly.

4.1.1 Apriori Pruning Rule

The following lemma follows from the definition of T_max.

Lemma 2. If O ⊆ O', then T_max(O') ⊆ T_max(O).

This lemma is intuitive: when the objectset grows bigger, the maximal timeset shrinks or at most stays the same. This further gives the following pruning rule.

Rule 1. (Apriori Pruning) For an objectset O, if |T_max(O)| < min_t, then there is no strict superset O' of O (O' ≠ O) s.t. (O', T_max(O')) is a (closed) swarm.

In Figure 5, the node with objectset O = {o1, o3} and its subtree are pruned by Apriori Pruning, because |T_max(O)| < min_t and all objectsets in the subtree are strict supersets of O. Similarly, for the objectsets {o2, o3}, {o3, o4} and {o1, o2, o3}, the nodes with these objectsets and their subtrees are also pruned by Apriori Pruning.
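As an illustrative check of Rule 1 on the running example (assumed frozenset encoding; helper names are hypothetical), the sketch below lists the two-object nodes whose maximal timeset is already shorter than min_t = 2. It recovers {o1, o3}, {o2, o3} and {o3, o4}, matching the pruned two-object nodes described above. The anti-monotonicity of Lemma 2 is what makes discarding their whole subtrees safe.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List

# Hypothetical encoding of the Figure 4(b) snapshots.
SNAPSHOTS: Dict[str, List[FrozenSet[str]]] = {
    "t1": [frozenset({"o1", "o2", "o4"}), frozenset({"o3"})],
    "t2": [frozenset({"o1", "o3"}), frozenset({"o1", "o2", "o4"})],
    "t3": [frozenset({"o1"}), frozenset({"o2", "o3", "o4"})],
    "t4": [frozenset({"o3"}), frozenset({"o1", "o2", "o4"})],
}
OBJECTS = ["o1", "o2", "o3", "o4"]


def max_timeset(objs: FrozenSet[str], snapshots) -> List[str]:
    return [t for t, clusters in snapshots.items()
            if any(objs <= c for c in clusters)]


def apriori_prunable(objs: FrozenSet[str], snapshots, min_t: int) -> bool:
    """Rule 1: no strict superset of O can reach min_t timestamps."""
    return len(max_timeset(objs, snapshots)) < min_t


pruned = [pair for pair in combinations(OBJECTS, 2)
          if apriori_prunable(frozenset(pair), SNAPSHOTS, 2)]
# [('o1', 'o3'), ('o2', 'o3'), ('o3', 'o4')]
```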

4.1.2 Backward Pruning Rule

By using Apriori Pruning, we prune objectsets O with |T_max(O)| < min_t. However, the remaining search space could still be extremely large, as shown in the following example.

Suppose there are 100 objects that are all in the same cluster for the whole time span. Given min_o = 1 and min_t = 1, we can hardly prune any node using Apriori Pruning, and the number of objectsets we need to visit is 2^100. But it is easy to see that there is only one closed swarm: (O_db, T_db). We get this closed swarm when we visit the objectset O = O_db in the DFS after 100 iterations. After that, we waste a lot of time searching objectsets that can never produce any closed swarm.

Since our goal is to mine only closed swarms, we can develop another, stronger pruning rule to prune the subtrees that cannot produce closed swarms. Let us first make some observations on the running example.

In Figure 5, for the node with objectset O = {o1, o4}, we can insert o2 into O and form a superset O' = {o1, o2, o4}. O' has been visited and expanded before O is visited, and we can see that T_max(O) = T_max(O') = {t1, t2, t4}. This indicates that at any timestamp when o1 and o4 are together, o2 is also in the same cluster as them. So any superset of {o1, o4} without o2 can never form a closed swarm. Meanwhile, o2 will not appear in O's subtree in the depth-first search order. Thus, the node with {o1, o4} and its subtree can be pruned.

To formalize the Backward Pruning rule, we first state the following lemma.

Lemma 3. Consider an objectset O = {o_i1, o_i2, ..., o_im} (i1 < i2 < ... < im). If there exists an objectset O' generated by adding an additional object o_i' (o_i' ∉ O and i' < im) into O such that C_tj(O) ⊆ C_tj(o_i'), ∀tj ∈ T_max(O), then for any objectset O'' satisfying O ⊆ O'' but O' ⊈ O'', (O'', T_max(O'')) is not a closed swarm.

Note that when overlapping is not allowed in the clusters, the condition C_tj(O) ⊆ C_tj(o_i'), ∀tj ∈ T_max(O) simply reduces to T_max(O') = T_max(O). Armed with the above lemma, we have the following pruning rule.

Rule 2. (Backward Pruning) Consider an objectset O = {o_i1, o_i2, ..., o_im} (i1 < i2 < ... < im). If there exists an objectset O' generated by adding an additional object o_i' (o_i' ∉ O and i' < im) into O such that C_tj(O) ⊆ C_tj(o_i'), ∀tj ∈ T_max(O), then O can be pruned in the objectset search space (stop growing from O in the depth-first search).

Figure 5: ObjectGrowth Search Space (min_o = 2, min_t = 2)

Backward Pruning is efficient in the sense that it only needs to examine the supersets of O with one more object rather than all supersets. This rule can prune a significant portion of the search space for mining closed swarms. Experimental results (see Figure 8) show that the speedup (compared with algorithms that mine all swarms without this rule) is an exponential factor w.r.t. the dataset size.
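Rule 2 only inspects objects ordered before the last object of O. The sketch below is a direct, unoptimized reading of the rule. It assumes the objects have a fixed global order given as a list, reuses the hypothetical frozenset encoding of the running example, and all names are hypothetical. With overlapping clusters it tests C_t(O) ⊆ C_t(o) literally: o must lie in every cluster that contains all of O, at every t in T_max(O).

```python
from typing import Dict, FrozenSet, List, Sequence

# Hypothetical encoding of the Figure 4(b) snapshots and object order.
SNAPSHOTS: Dict[str, List[FrozenSet[str]]] = {
    "t1": [frozenset({"o1", "o2", "o4"}), frozenset({"o3"})],
    "t2": [frozenset({"o1", "o3"}), frozenset({"o1", "o2", "o4"})],
    "t3": [frozenset({"o1"}), frozenset({"o2", "o3", "o4"})],
    "t4": [frozenset({"o3"}), frozenset({"o1", "o2", "o4"})],
}
OBJECTS = ["o1", "o2", "o3", "o4"]


def max_timeset(objs: FrozenSet[str], snapshots) -> List[str]:
    return [t for t, clusters in snapshots.items()
            if any(objs <= c for c in clusters)]


def backward_prunable(objs: Sequence[str], order: Sequence[str],
                      snapshots) -> bool:
    """Rule 2: True if O can be pruned by an earlier object o not in O."""
    O = frozenset(objs)
    last = max(order.index(o) for o in objs)  # position of o_im
    # Assumes Apriori Pruning already ran, so T_max(O) is non-empty.
    tmax = max_timeset(O, snapshots)
    for o in order[:last]:                    # only objects with i' < im
        if o in O:
            continue
        # C_t(O) ⊆ C_t(o): every cluster holding all of O also holds o
        if all(o in c for t in tmax for c in snapshots[t] if O <= c):
            return True
    return False
```

On the running example this prunes {o1, o4} (because of o2) but keeps {o2, o4} and {o1, o2, o4}.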

4.1.3 Forward Closure Checking

To check whether a swarm (O, T_max(O)) is closed, the definition of closed swarm would require checking every superset O' of O together with T_max(O'). But, according to the following lemma, checking the supersets O' of O with exactly one more object suffices.

Lemma 4. Swarm (O, T_max(O)) is closed iff for any superset O' of O with exactly one more object, we have |T_max(O')| < |T_max(O)|.

In Figure 5, the node with objectset O = {o1, o2} is not pruned by any pruning rule. But it has a child node with objectset {o1, o2, o4} whose maximal timeset equals T_max(O). Thus, by Lemma 4, ({o1, o2}, {t1, t2, t4}) is not a closed swarm.
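Lemma 4 reduces the closure test to the |O_db| − |O| supersets with exactly one extra object. A sketch under the same assumptions as the previous examples (hypothetical names, frozenset encoding of Figure 4(b)); it is meant for a pair (O, T_max(O)) that already satisfies the swarm thresholds.

```python
from typing import Dict, FrozenSet, List, Sequence

# Hypothetical encoding of the Figure 4(b) snapshots and object list.
SNAPSHOTS: Dict[str, List[FrozenSet[str]]] = {
    "t1": [frozenset({"o1", "o2", "o4"}), frozenset({"o3"})],
    "t2": [frozenset({"o1", "o3"}), frozenset({"o1", "o2", "o4"})],
    "t3": [frozenset({"o1"}), frozenset({"o2", "o3", "o4"})],
    "t4": [frozenset({"o3"}), frozenset({"o1", "o2", "o4"})],
}
OBJECTS = ["o1", "o2", "o3", "o4"]


def max_timeset(objs: FrozenSet[str], snapshots) -> List[str]:
    return [t for t, clusters in snapshots.items()
            if any(objs <= c for c in clusters)]


def is_closed_by_lemma4(objs: FrozenSet[str], objects: Sequence[str],
                        snapshots) -> bool:
    """Closed iff every one-object superset has a strictly smaller T_max."""
    size = len(max_timeset(objs, snapshots))
    return all(len(max_timeset(objs | {o}, snapshots)) < size
               for o in objects if o not in objs)
```

For {o1, o2} the test fails because {o1, o2, o4} keeps all three timestamps, whereas {o2, o4} and {o1, o2, o4} pass.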

Consider a superset O' of objectset O = {o_i1, ..., o_im} s.t. O' \ O = {o_i'}. Rule 2 checks the case i' < im. The following rule checks the case i' > im.

Rule 3. (Forward Closure Checking) Consider an objectset O = {o_i1, o_i2, ..., o_im} (i1 < i2 < ... < im). If there exists an objectset O' generated by adding an additional object o_i' (o_i' ∉ O and i' > im) into O such that |T_max(O')| = |T_max(O)|, then (O, T_max(O)) is not a closed swarm.

Note that, unlike Rule 2, Rule 3 does not prune the objectset O in the DFS; in other words, we cannot stop the DFS at O. But this rule is useful for detecting non-closed swarms.

4.1.4 The ObjectGrowth Algorithm

Figure 5 shows the complete ObjectGrowth algorithm for our running example. We traverse the search space in DFS order. When visiting the node with O = {o1, o2, o3}, it fails the Apriori Pruning condition, so we stop growing from it, trace back and visit the node O = {o1, o2, o4}. O passes both pruning rules as well as Forward Closure Checking. By Theorem 1, introduced immediately below, O and its maximal timeset T = {t1, t2, t4} form a closed swarm, so we can output (O, T). When we trace back to node {o1, o2}, because its child contains a closed swarm with the same timeset as {o1, o2}'s maximal timeset, {o1, o2} is not a closed swarm by Forward Closure Checking. We continue visiting nodes until we finish the traversal of the objectset-based DFS tree.

Theorem 1. (Identification of closed swarm in ObjectGrowth) For a node with objectset O, (O, T_max(O)) is a closed swarm if and only if it passes Apriori Pruning, Backward Pruning, Forward Closure Checking, and |O| ≥ min_o.

Theorem 1 embeds the discovery of closed swarms in the search process, so that closed swarms can be reported on-the-fly.

Algorithm 1 presents the pseudocode of ObjectGrowth. To find all closed swarms, we start with ObjectGrowth({}, T_db, 0, min_o, min_t, |O_db|, |T_db|, C_db).

When visiting the node with objectset O, we first check whether it passes Apriori Pruning (lines 2-3). Checking the size of T_max takes only O(1) time.

Next, we check whether the current node passes Backward Pruning. In the subroutine BackwardPruning (lines 15-18), we generate O' by adding a new object o (o ∉ O and o < o_last) into O. Then we check whether o lies in every cluster that contains all the objects of O, at every timestamp in T_max(O). If so, the current objectset O fails Backward Pruning. This subroutine takes O(|O_db| × |T_db|) time in the worst case.

After both pruning steps, we visit all the child nodes in DFS order (lines 6-12). (i) For a child node with objectset O', we generate its maximal timeset in the subroutine GenerateMaxTimeset (lines 19-22). When generating T_max(O'), we do not need to check every timestamp; instead, we only need to check the timestamps in T_max(O), because T_max(O') ⊆ T_max(O) by Lemma 2. So in each iteration, this subroutine takes O(|T_db|) time in the worst case. (ii) For Forward Closure Checking, we use a variable forward_closure to record whether any child node with objectset O' has the same maximal timeset size as T_max(O). This takes O(1) time in each iteration. In sum, as a node has at most |O_db| direct child nodes, this loop (lines 7-12) repeats at most |O_db| times and takes O(|O_db| × |T_db|) time in the worst case.

Finally, after visiting the subtree under the current node, if the node passes Forward Closure Checking and |O| ≥ min_o, we can immediately output (O, T_max(O)) as a closed swarm (lines 13-14).


Algorithm 1 ObjectGrowth(O, T_max, o_last, min_o, min_t, |O_db|, |T_db|, C_db)

Input: O: current objectset; T_max: maximal timeset of O; o_last: latest object added into O; min_o and min_t: minimum threshold parameters; |O_db|: number of objects in database; |T_db|: number of timestamps in database; C_db: clustering snapshots at each timestamp.

Output: (O, T_max) if it is a closed swarm.

Algorithm:
1:  {Apriori Pruning}
2:  if |T_max| < min_t then
3:      return;
4:  {Backward Pruning}
5:  if BackwardPruning(o_last, O, T_max, C_db) then
6:      forward_closure ← true;
7:      for o ← o_last + 1 to |O_db| do
8:          O_new ← O ∪ {o};
9:          T_new ← GenerateMaxTimeset(o, o_last, T_max, C_db);
10:         if |T_new| = |T_max| then
11:             forward_closure ← false; {Forward Closure Checking}
12:         ObjectGrowth(O_new, T_new, o, min_o, min_t, |O_db|, |T_db|, C_db);
13:     if forward_closure and |O| ≥ min_o then
14:         output pair (O, T_max) as a closed swarm;

Subroutine: BackwardPruning(o_last, O, T_max, C_db)
15: for ∀o ∉ O and o < o_last do
16:     if C_t(O) ⊆ C_t(o), ∀t ∈ T_max then
17:         return false;
18: return true;

Subroutine: GenerateMaxTimeset(o, o_last, T_max, C_db)
19: for ∀t ∈ T_max do {T'_max is initially ∅}
20:     if C_t(o) ∩ C_t(o_last) ≠ ∅ then
21:         T'_max ← T'_max ∪ {t};
22: return T'_max;


Therefore, it takes O(|O_db| × |T_db|) time for each node visited in the depth-first search. The memory usage of ObjectGrowth is O(|T_db| × |O_db|) in the worst case.
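The following Python sketch mirrors Algorithm 1 to make the control flow concrete. It rests on assumptions of ours, not the paper's: clustering snapshots are given as one dict per timestamp that maps an object id to a non-overlapping cluster label, with noise objects omitted; objects are numbered 0 .. |O_db| − 1, so the root uses o_last = −1; and T_max({o}) for a single object is taken to be the timestamps at which o is clustered. Names such as mine_closed_swarms are hypothetical. This is an illustration, not the authors' C++ implementation.

```python
from typing import Dict, FrozenSet, List, Tuple

# Assumed input format (not specified in the paper): clusters[t] maps each object
# id (0 .. n_objects - 1) to its cluster label at timestamp t; objects missing
# from clusters[t] are noise, i.e. in no cluster at t.
Snapshots = List[Dict[int, int]]
Swarm = Tuple[List[int], List[int]]


def same_cluster(clusters: Snapshots, t: int, a: int, b: int) -> bool:
    """True if objects a and b share a (non-noise) cluster at timestamp t."""
    label = clusters[t].get(a)
    return label is not None and label == clusters[t].get(b)


def backward_pruning(o_last: int, objs: FrozenSet[int], t_max: List[int],
                     clusters: Snapshots) -> bool:
    """Lines 15-18: fail if a skipped object o < o_last (o not in O) is in O's
    cluster at every t in T_max. With non-overlapping clusters, o_last can
    stand in for C_t(O), since all of O share one cluster at each t in T_max."""
    for o in range(o_last):
        if o not in objs and all(same_cluster(clusters, t, o_last, o) for t in t_max):
            return False
    return True


def generate_max_timeset(o: int, o_last: int, t_max: List[int],
                         clusters: Snapshots) -> List[int]:
    """Lines 19-22: keep the timestamps of T_max(O) at which o joins O's cluster."""
    if o_last < 0:  # root node: T_max({o}) = timestamps where o is clustered (assumption)
        return [t for t in t_max if o in clusters[t]]
    return [t for t in t_max if same_cluster(clusters, t, o, o_last)]


def object_growth(objs: FrozenSet[int], t_max: List[int], o_last: int,
                  min_o: int, min_t: int, n_objects: int,
                  clusters: Snapshots, out: List[Swarm]) -> None:
    if len(t_max) < min_t:  # Apriori Pruning (lines 2-3)
        return
    if not backward_pruning(o_last, objs, t_max, clusters):  # Backward Pruning (line 5)
        return
    forward_closure = True
    for o in range(o_last + 1, n_objects):  # child nodes in DFS order (lines 7-12)
        t_new = generate_max_timeset(o, o_last, t_max, clusters)
        if len(t_new) == len(t_max):  # Forward Closure Checking
            forward_closure = False
        object_growth(objs | {o}, t_new, o, min_o, min_t, n_objects, clusters, out)
    if forward_closure and len(objs) >= min_o:  # lines 13-14
        out.append((sorted(objs), t_max))


def mine_closed_swarms(clusters: Snapshots, n_objects: int,
                       min_o: int, min_t: int) -> List[Swarm]:
    out: List[Swarm] = []
    object_growth(frozenset(), list(range(len(clusters))), -1,
                  min_o, min_t, n_objects, clusters, out)
    return out


snapshots = [
    {0: 0, 1: 0, 2: 1, 3: 1},  # t0: {o0, o1}, {o2, o3}
    {0: 0, 1: 0, 2: 0, 3: 1},  # t1: {o0, o1, o2}, {o3}
    {0: 0, 1: 1, 2: 0, 3: 1},  # t2: {o0, o2}, {o1, o3}
]
print(mine_closed_swarms(snapshots, 4, min_o=2, min_t=2))
# [([0, 1], [0, 1]), ([0, 2], [1, 2])]
```

In this toy input, {o0, o1} and {o0, o2} each stay together at two possibly non-consecutive timestamps, and no third object can join either pair without shrinking the timeset, so both pairs are reported as closed swarms.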

5.  EXPERIMENT 

A comprehensive performance study has been conducted on both real and synthetic datasets. All the algorithms were implemented in C++, and all the experiments were carried out on a 2.8 GHz Intel Core 2 Duo system with 4GB memory. The system ran Mac OS X version 10.5.5 and gcc 4.0.1.

The implementation of swarm mining is also integrated into our demonstration system [18]. The demo system is publicly available online¹. It has been tested on a set of real animal datasets from MoveBank.org². The data and results are visualized in Google Maps³ and Google Earth⁴.

¹ http://dm.cs.uiuc.edu/movemine/

² http://www.movebank.org

³ http://maps.google.com

⁴ http://earth.google.com


5.1  Effectiveness 


Figure  6:  Raw  buffalo  data. 


The effectiveness of the swarm pattern can be demonstrated through our online demo system. Here, we use one dataset as an example. This dataset contains 165 buffalo tracked from 2000 to 2006. The original data has 26,610 reported locations. Figure 6 shows the raw data plotted in Google Maps.

For each buffalo, a location is reported about every 3 or 4 days. We first use linear interpolation to fill in the missing data with a time gap of one "day". Note that the first and last tracking days can differ across buffalo. The buffalo with the longest tracking time has 2,023 days of data, and the one with the shortest tracking time has only 1 day. On average, each buffalo has 901 days. We do not interpolate the data to enforce the same first/last tracking day. Instead, we require that the objects forming a swarm be together for at least a fraction min_t of their overlapping tracking timestamps. For example, with min_t = 0.5, o1 and o2 form a swarm if they are in the same cluster for at least half of their overlapping tracking timestamps. Then, DBSCAN [7] with parameters MinPts = 5 and Eps = 0.001 is applied to generate clusters at each timestamp (i.e., C_db). Note that, depending on users' specific requirements, different clustering methods and parameter settings can be applied to preprocess the raw data.
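The relative threshold above can be read as a simple test on the objects' overlapping tracking period. Below is a minimal Python sketch under our own assumptions: each tracking span is an inclusive (first_day, last_day) range, the overlap is the intersection of all spans, and the days on which the objects share a cluster are already known from C_db. The function name forms_relative_swarm is hypothetical.

```python
from typing import Iterable, List, Tuple


def forms_relative_swarm(spans: List[Tuple[int, int]], together_days: Iterable[int],
                         min_t_ratio: float) -> bool:
    """spans: inclusive (first_day, last_day) tracking span of each object.
    together_days: days on which all the objects are in the same cluster.
    True if they are together on at least min_t_ratio of their overlapping days."""
    start = max(first for first, _ in spans)
    end = min(last for _, last in spans)
    overlap = end - start + 1
    if overlap <= 0:
        return False  # the tracking spans never overlap
    together = sum(1 for d in set(together_days) if start <= d <= end)
    return together >= min_t_ratio * overlap


# Overlap is days 4..9 (6 days); days 4, 5, 6 fall inside it, day 12 does not.
print(forms_relative_swarm([(0, 9), (4, 13)], [4, 5, 6, 12], 0.5))  # True
print(forms_relative_swarm([(0, 9), (4, 13)], [4, 5, 6, 12], 0.6))  # False
```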

By setting min_o = 2 and min_t = 0.5 (i.e., half of the overlapping time span), we find 66 closed swarms. Figure 7(a) shows one swarm, in which each color represents the raw trajectory of a buffalo. This swarm contains 5 buffalo, and the timestamps at which these buffalo are in the same cluster are non-consecutive. Looking at the raw trajectory data in Figure 6, one can hardly detect interesting patterns manually. The discovered swarms provide useful information for biologists to further examine the relationships and habits of these buffalo.

For comparison, we test convoy pattern mining on the same dataset. There are two parameters in the convoy definition, m (number of objects) and k (threshold of consecutive timestamps), so m corresponds to min_o and k corresponds to min_t. (For details of the convoy definition and algorithm, please refer to Section C.1.) We first use the same parameters (i.e., min_o = 2 and min_t = 0.5) to mine convoys. However, no convoy is discovered, because no group of buffalo moves together consecutively for half of the whole time span. After lowering min_t from 0.5 to 0.2, one convoy is


(a) One of the seven swarms discovered with min_o = 2 and min_t = 0.5


(b) One convoy discovered with min_o = m = 2 and min_t = k = 0.2


Figure  7:  Effectiveness  comparison  between  swarm 
and  convoy. 

discovered, as shown in Figure 7(b). But this convoy, containing 2 buffalo, is just a subset of one swarm pattern. The rigid definition of convoy makes it impractical for finding potentially interesting patterns. The comparison shows that the concept of (closed) swarms is especially meaningful for revealing moving object clusters with relaxed temporal constraints.
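To make the difference between the two temporal requirements concrete, the sketch below checks a hypothetical list of timestamps at which a group shares a cluster. It applies a swarm-style threshold (total count ≥ min_t) and a convoy-style threshold (a run of at least k consecutive timestamps). Both thresholds are absolute counts here, unlike the ratios used in the buffalo experiment, and the helper names are illustrative only.

```python
from typing import Iterable, Optional


def longest_consecutive_run(times: Iterable[int]) -> int:
    """Length of the longest run of consecutive timestamps in `times`."""
    best = run = 0
    prev: Optional[int] = None
    for t in sorted(set(times)):
        run = run + 1 if prev is not None and t == prev + 1 else 1
        best = max(best, run)
        prev = t
    return best


def meets_swarm_time(times: Iterable[int], min_t: int) -> bool:
    """Swarm-style requirement: together at >= min_t timestamps, in any order."""
    return len(set(times)) >= min_t


def meets_convoy_time(times: Iterable[int], k: int) -> bool:
    """Convoy-style requirement: together for >= k consecutive timestamps."""
    return longest_consecutive_run(times) >= k


together = [1, 2, 3, 7, 8, 9, 13, 14]  # hypothetical co-clustered timestamps
print(meets_swarm_time(together, 6), meets_convoy_time(together, 6))  # True False
```

A group that meets repeatedly but briefly, as in this example, passes the swarm test and fails the convoy test, because its longest consecutive run is only 3.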

5.2  Efficiency 

To show the efficiency of our algorithms, we generate a larger synthetic dataset using Brinkhoff's network-based generator of moving objects⁵. We generate 500 objects (|O_db| = 500) for 10^5 timestamps (|T_db| = 10^5) using the generator's default map and parameter settings. There are 5 × 10^7 points in total. DBSCAN (MinPts = 3, Eps = 300) is applied to obtain the clusters at each snapshot.

In the efficiency comparison, we include a new algorithm, ObjectGrowth+, which extends ObjectGrowth to handle probabilistic data. Due to space limits, the details of ObjectGrowth+ are presented in Appendix A; here, we briefly describe its idea. ObjectGrowth+ is designed to handle an important issue in raw data: asynchronous data collection. Specifically, each object usually reports its locations at asynchronous timestamps. However, since we assume there is a location for each object at every timestamp, some interpolation method must first be used to fill in the missing data. But such

⁵ http://www.fh-oow.de/institute/iapg/personen/brinkhoff/generator/


interpolation is only an estimate of the real locations, so each point is associated with a probability indicating the confidence of its estimation. While ObjectGrowth assumes every point is certain, ObjectGrowth+ is a more general version of ObjectGrowth that can handle probabilistic data.

We compare our algorithms with VG-Growth [22], which is the only previous work addressing the non-consecutive timestamps issue. For a fair comparison, we adapt VG-Growth to take the same input as ours while keeping its time complexity unchanged. This transformation is described in Section C.2. We further create a probabilistic database by randomly sampling 1% of the points and assigning a random probability to each of them. ObjectGrowth+ takes this additional probabilistic database as input. The algorithms are compared with respect to two parameters (i.e., min_o and min_t) and the database size (i.e., |O_db| and |T_db|). By default, |O_db| = 500, |T_db| = 10^5, min_o = 0.01, min_t = 0.01, and θ = 0.9 (for ObjectGrowth+). We carry out four experiments, each varying one variable while keeping the other three fixed. Note that in the following experiments, min_o denotes the ratio of min_o over |O_db|, and min_t denotes the ratio of min_t over |T_db|.

Efficiency w.r.t. min_o and min_t. Figure 8(a) shows the running time w.r.t. min_o. VG-Growth takes much longer than ObjectGrowth; it cannot even produce results within 5 hours when min_o = 0.018 in Figure 8(a). The reason is that VG-Growth tries to find all swarms rather than only closed swarms, and the number of swarms is exponentially larger than the number of closed swarms, as shown in Figure 9(a) and Figure 9(b). In addition, ObjectGrowth+ is slower than ObjectGrowth because the Backward Pruning rule is weaker in ObjectGrowth+, owing to the additional constraint imposed by the probabilistic database (Lemma 6).

Efficiency w.r.t. |O_db| and |T_db|. Figure 8(c) and Figure 8(d) depict the running time when varying |O_db| and |T_db|, respectively. In both figures, VG-Growth is much slower than ObjectGrowth and ObjectGrowth+. Furthermore, ObjectGrowth+ is usually 10 times slower than ObjectGrowth. Comparing Figure 8(c) and Figure 8(d), we can see that ObjectGrowth is more sensitive to changes in |O_db|. This is because its search space grows with |O_db|, whereas changes in |T_db| do not directly affect the running time of ObjectGrowth.

In summary, ObjectGrowth and ObjectGrowth+ greatly outperform VG-Growth, since the number of swarms is exponentially larger than the number of closed swarms. ObjectGrowth+ is slower than ObjectGrowth because it considers probabilistic data, and thus its pruning rule is weaker. Moreover, both ObjectGrowth and ObjectGrowth+ are more sensitive to the size of O_db than to that of T_db, since the search space is based on the objectset.

6.  CONCLUSIONS 

We propose the concepts of swarm and closed swarm. These concepts differ from those in previous work, and they enable the discovery of interesting moving object clusters with relaxed temporal constraints. A new method, ObjectGrowth, with two strong pruning rules and one closure checking rule, is proposed to efficiently discover closed swarms. The effectiveness is demonstrated using real data, and the efficiency is tested on large synthetic data.


(a) Running time w.r.t. min_o


(b) Running time w.r.t. min_t


Figure  8:  Running  Time  on  Synthetic  Dataset 


(a) Number of (closed) swarms w.r.t. min_o
(b) Number of (closed) swarms w.r.t. min_t
(c) Number of (closed) swarms w.r.t. |O_db|
(d) Number of (closed) swarms w.r.t. |T_db|


Figure  9:  Number  of  (closed)  swarms  in  Synthetic  Dataset 
APPENDIX

A.  HANDLING  ASYNCHRONOUS  DATA 


Figure  10:  Asynchronous  raw  data 

When mining swarms, we assume that each moving object has a reported location at each timestamp. However, in most real cases, the collected raw data is not as ideal as expected. First, the data for a moving object could be sparse. When tracking animals, it is quite possible that we only get one reported point every several hours or even every several days. When tracking vehicles or people, there could be periods of missing data when people turn off their tracking devices (e.g., GPS or cell phones). Second, the sampling timestamps of different moving objects are usually not synchronized. As shown in Figure 10, the recorded times of objects A and B differ: the recorded time points of A are 9:00, 11:00, and 14:00, while those of B are 9:00, 12:00, and 13:00. The locations at other timestamps are estimates of the real locations.

The raw data is usually preprocessed using linear interpolation. But moving objects do not necessarily follow the linear model, so such interpolation could be wrong. Even though more sophisticated interpolation methods could fill in the missing data with higher precision, any interpolation is only a guess at the real positions. For some missing points, we may have higher confidence in the guessed location, whereas for others the confidence could be lower. So each interpolated point is associated with a probability indicating the confidence of the guess.

To find real swarms, it is important to differentiate between reported locations and interpolated locations. For example, if most objects in a swarm (O, T) have low-confidence interpolated positions at the times in T, those objects may not actually move together, because the interpolated points have a high probability of being wrong. On the other hand, if we can first find the swarms with high confidence, we can use them to better adjust the missing points and then iteratively refine the swarms. In this section, we focus on how to generalize our ObjectGrowth method to handle probabilistic data. In Appendix D.2, we discuss how to obtain the probabilities, and we leave the iterative refinement framework as interesting future work.

A.1 Closed Swarms with Probability

We first define closed swarms in the context of probabilistic data. A probabilistic database is derived from the original trajectory data. P_{t_i}(o_j) denotes the probability of o_j at time point t_i. If there is a recorded location of o_j at t_i, P_{t_i}(o_j) = 1. If not, some interpolation method is used to guess this location, and the probability is calculated based on the confidence of the interpolation of o_j at t_i.

Given a probabilistic database {P_{t_i}(o_j)}, we define the confidence of a pair (O, T) as:

$$f(O, T) = \sum_{t_i \in T} \prod_{o_j \in O} P_{t_i}(o_j)$$

Recall that in our definition, a swarm is a pair (O, T) that satisfies the three constraints listed in Section 3.

To accommodate probabilistic data, we propose to simply replace the second constraint, |T| ≥ min_t, with the following generalized minimum timestamp constraint:

(2') f(O, T) ≥ θ × min_t: (O, T) needs to satisfy the confidence threshold θ.

It is easy to see that if P_{t_i}(o_j) = 1 for all t_i ∈ T and all o_j ∈ O, then with θ = 1 condition (2') becomes f(O, T) = |T| ≥ min_t. Meanwhile, the more uncertainty there is in the original data, the harder it is to satisfy this requirement. In other words, objects with more reported locations at the timestamps in T are preferred. Note that the definition of closed swarms remains the same.
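As a small worked illustration (the probabilities are made up for the example and do not come from any dataset), take O = {o_1, o_2} and T = {t_1, t_2, t_3}. Both objects have reported locations at t_1 and t_3, while their positions at t_2 are interpolated with P_{t_2}(o_1) = 0.5 and P_{t_2}(o_2) = 0.8:

$$f(O, T) = 1 \cdot 1 + 0.5 \cdot 0.8 + 1 \cdot 1 = 2.4$$

With θ = 0.9, condition (2') holds for min_t = 2, since 2.4 ≥ 1.8, but fails for min_t = 3, since 2.4 < 2.7. The certain-data condition |T| = 3 ≥ min_t would accept both settings.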

A.2  The  ObjectGrowth+  method 

The ObjectGrowth+ method is derived from the ObjectGrowth method to accommodate probabilistic data. Its general philosophy is the same as that of ObjectGrowth. We search the objectset space in DFS fashion, use the Apriori and Backward rules to prune redundant search space, and use Closure Checking to report closed swarms on the fly. These rules are modified accordingly. In this section, we formulate the lemmas and informally describe the rules; the proofs are deferred to Appendix B.

Lemma 5. If O ⊆ O', then f(O', T_max(O')) ≤ f(O, T_max(O)).

The Apriori pruning rule follows naturally from Lemma 5. That is, when visiting node (O, T_max(O)), if f(O, T_max(O)) < θ × min_t, we can stop searching deeper from this node. All the objectsets in its child nodes are supersets of O, so all the child nodes also violate this requirement for closed swarms.

Lemma 6. Consider an objectset O = {o_{i_1}, o_{i_2}, ..., o_{i_m}} (i_1 < i_2 < ... < i_m). Suppose there exists an objectset O' generated by adding an additional object o_{i'} (o_{i'} ∉ O and i' < i_m) into O such that C_{t_j}(O) ⊆ C_{t_j}(o_{i'}) and P_{t_j}(o_{i'}) = 1 for all t_j ∈ T_max(O). Then for any objectset O'' satisfying O ⊆ O'' but O' ⊄ O'', (O'', T_max(O'')) is not a closed swarm.

Comparing Lemma 6 with Lemma 3, we can see that Lemma 6 has an additional constraint. Besides checking whether C_{t_j}(O) ⊆ C_{t_j}(o_{i'}), we need to further check whether P_{t_j}(o_{i'}) = 1 for all t_j ∈ T_max(O). Because these constraints are harder to satisfy, the Backward Pruning rule becomes weaker in ObjectGrowth+. For example, let O = {o2} and O' = {o1, o2}. We cannot prune the node (O, T_max(O)). There could exist a child node of O, O'' = {o2, o3}, such that f(O'', T_max(O'')) ≥ θ × min_t, whereas the child node of O', O''' = {o1, o2, o3}, does not satisfy f(O''', T_max(O''')) ≥ θ × min_t. This is the major reason why ObjectGrowth+ takes longer than ObjectGrowth to discover swarms, as shown in the experiments in Section 5.

Lemma 7. Swarm (O, T_max(O)) is closed iff for any superset O' of O with exactly one more object, we have |T_max(O')| < |T_max(O)| or f(O', T_max(O')) < θ × min_t.

When visiting the node with O, unlike the forward closure checking rule in ObjectGrowth, we need to check both "forward" and "backward" supersets of O. If O = {o_{i_1}, o_{i_2}, ..., o_{i_m}}, we need to test each superset O' obtained by adding an object o_{i'} ∉ O. If every such O' satisfies |T_max(O')| < |T_max(O)| or f(O', T_max(O')) < θ × min_t, the current node with O passes the closure checking.

Finally, we output the set of nodes (O, T_max(O)) that pass all the conditions and satisfy |O| ≥ min_o.
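The closure test of Lemma 7 can be sketched directly, without the DFS machinery, to show how probabilities change the outcome. The sketch rests on assumptions of ours: the snapshot format clusters[t][o] → cluster label; a probability table prob[t][o] in which absent entries denote reported points (P = 1); and the hypothetical name passes_closure_plus. The |O| ≥ min_o check is left to the caller. In the example, o2's only positions at the shared timestamps are low-confidence interpolations, so {o0, o1} counts as closed under ObjectGrowth+ but not once every point is treated as certain.

```python
from math import prod
from typing import Dict, FrozenSet, List

Snapshots = List[Dict[int, int]]        # clusters[t][o] -> cluster label (assumed)
Probabilities = List[Dict[int, float]]  # prob[t][o] -> P_t(o); absent means 1.0 (assumed)


def max_timeset(objs: FrozenSet[int], clusters: Snapshots) -> List[int]:
    """T_max(O): timestamps at which all objects of O share one cluster."""
    times = []
    for t, labels in enumerate(clusters):
        found = {labels.get(o) for o in objs}
        if len(found) == 1 and None not in found:
            times.append(t)
    return times


def confidence(objs: FrozenSet[int], times: List[int], prob: Probabilities) -> float:
    """f(O, T): sum over t in T of the product over o in O of P_t(o)."""
    return sum(prod(prob[t].get(o, 1.0) for o in objs) for t in times)


def passes_closure_plus(objs: FrozenSet[int], clusters: Snapshots, prob: Probabilities,
                        n_objects: int, theta: float, min_t: int) -> bool:
    """Lemma 7 style check of (O, T_max(O)) against every one-object superset,
    in both the forward (larger id) and backward (smaller id) directions."""
    t_max = max_timeset(objs, clusters)
    if confidence(objs, t_max, prob) < theta * min_t:
        return False  # (O, T_max(O)) violates condition (2'), so it is not a swarm
    for o in range(n_objects):
        if o in objs:
            continue
        sup = objs | {o}
        t_sup = max_timeset(sup, clusters)
        # T_max(O') is a subset of T_max(O), so "not smaller" means equal size.
        if len(t_sup) == len(t_max) and confidence(sup, t_sup, prob) >= theta * min_t:
            return False
    return True


clusters = [{0: 0, 1: 1, 2: 0}, {0: 0, 1: 0, 2: 0}, {0: 0, 1: 0, 2: 0}]
uncertain = [{}, {2: 0.3}, {2: 0.5}]  # o2 is interpolated at t1 and t2
certain = [{}, {}, {}]
print(passes_closure_plus(frozenset({0, 1}), clusters, uncertain, 3, 1.0, 2))  # True
print(passes_closure_plus(frozenset({0, 1}), clusters, certain, 3, 1.0, 2))    # False
```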

B. PROOFS OF LEMMAS AND THEOREMS
B.1 Proof of Lemma 1

The existence is straightforward. Starting from T' = T, we can keep adding timestamps while (O, T') remains a swarm, and since T_db is finite, this process stops at a time-closed swarm. We only need to prove the uniqueness. For the purpose of contradiction, suppose there are two time-closed swarms (O, T_1) and (O, T_2), T_1 ≠ T_2, s.t. T ⊆ T_1 and T ⊆ T_2. Letting T'' = T_1 ∪ T_2, (O, T'') is also a swarm by the definition. Because T_1 ≠ T_2, at least one of T_1 and T_2 is a proper subset of T''. However, this contradicts the fact that (O, T_1) and (O, T_2) are both time-closed. So there is a unique time-closed swarm (O, T') s.t. T ⊆ T'. □


B.2 Proof of Lemma 2

For every t ∈ T_max(O'), by the definition of maximal timeset, we have C_t(o'_i) ∩ C_t(o'_j) ≠ ∅ for all o'_i, o'_j ∈ O'. This implies C_t(o_i) ∩ C_t(o_j) ≠ ∅ for all o_i, o_j ∈ O, since O ⊆ O'. Therefore, t ∈ T_max(O). □

B.3 Proof of Lemma 3

We show that (O'' ∪ {o_{i'}}, T_max(O'')) is a swarm; hence (O'', T_max(O'')) cannot be a closed swarm. By construction, it trivially satisfies conditions (1) and (2) in the definition of a swarm, so we only need to check condition (3). To see that (3) holds for (O'' ∪ {o_{i'}}, T_max(O'')) as well, note that C_{t_j}(O) ⊆ C_{t_j}(o_{i'}) for all t_j ∈ T_max(O). Since T_max(O'') ⊆ T_max(O), we have

$$C_{t_j}(O'') \subseteq C_{t_j}(O) \subseteq C_{t_j}(o_{i'}), \quad \forall t_j \in T_{max}(O''),$$

hence,

$$C_{t_j}(O'') \cap C_{t_j}(o_{i'}) = C_{t_j}(O'') \neq \emptyset, \quad \forall t_j \in T_{max}(O'').$$

Therefore, (O'' ∪ {o_{i'}}, T_max(O'')) is a swarm and we are done. □

B.4 Proof of Lemma 4

The proof of (⇒) is trivial by the definition of closed swarm. For (⇐), first note that by using T_max(O), the pair automatically satisfies the time-closed condition. Second, for any O'' s.t. O ⊊ O'', choose o'' ∈ O'' \ O and consider the set O^+ = O ∪ {o''}. Since T_max(O^+) ⊆ T_max(O) and |T_max(O^+)| < |T_max(O)| by assumption, we get T_max(O^+) ⊊ T_max(O). By Lemma 2, T_max(O'') ⊆ T_max(O^+) ⊊ T_max(O). Thus, (O'', T_max(O)) is not a swarm. Hence (O, T_max(O)) is object-closed and therefore closed. □

B.5 Proof of Theorem 1

Clearly, every closed swarm passes the Apriori Pruning, Backward Pruning, and Forward Closure Checking, and satisfies |O| ≥ min_o. Now suppose that a node (O, T_max(O)) passes all the conditions. First, the Apriori Pruning ensures that |T_max(O)| ≥ min_t. Also, we explicitly require |O| ≥ min_o, so (O, T_max(O)) satisfies the swarm requirements. Next, by definition, (O, T_max(O)) is guaranteed to be time-closed. Finally, if (O, T_max(O)) were not object-closed, it would fail the conditions of Backward Pruning or Forward Closure Checking (Lemma 4). Therefore, (O, T_max(O)) is a closed swarm. □

B.6 Proof of Lemma 5

By Lemma 2, we have T_max(O') ⊆ T_max(O). Therefore,

$$f(O', T_{max}(O')) = \sum_{t_i \in T_{max}(O')} \prod_{o_j \in O'} P_{t_i}(o_j) \le \sum_{t_i \in T_{max}(O)} \prod_{o_j \in O'} P_{t_i}(o_j) \le \sum_{t_i \in T_{max}(O)} \prod_{o_j \in O} P_{t_i}(o_j) = f(O, T_{max}(O)),$$

where we use the fact that 0 ≤ P_{t_i}(o_j) ≤ 1 for any t_i and o_j. □

B.7 Proof of Lemma 6

Note that since 0 ≤ P_{t_i}(o_j) ≤ 1 for any t_i and o_j, the conditions C_{t_j}(O) ⊆ C_{t_j}(o_{i'}) and P_{t_j}(o_{i'}) = 1 for all t_j ∈ T_max(O) imply:

$$f(O', T_{max}(O')) = f(O, T_{max}(O)).$$

We demonstrate that (O'' ∪ {o_{i'}}, T_max(O'')) is a swarm; hence (O'', T_max(O'')) cannot be a closed swarm. First, note that (O'' ∪ {o_{i'}}, T_max(O'')) trivially satisfies condition (1). For (2'), observe that P_{t_j}(o_{i'}) = 1 for all t_j ∈ T_max(O''), because T_max(O'') ⊆ T_max(O); therefore

$$f(O'' \cup \{o_{i'}\}, T_{max}(O'')) = \sum_{t_i \in T_{max}(O'')} \prod_{o_j \in O'' \cup \{o_{i'}\}} P_{t_i}(o_j) = \sum_{t_i \in T_{max}(O'')} \prod_{o_j \in O''} P_{t_i}(o_j) = f(O'', T_{max}(O'')),$$

which is at least θ × min_t whenever (O'', T_max(O'')) is itself a swarm. Finally, since T_max(O'') ⊆ T_max(O), we have

$$C_{t_j}(O'') \subseteq C_{t_j}(O) \subseteq C_{t_j}(o_{i'}), \quad \forall t_j \in T_{max}(O''),$$

hence,

$$C_{t_j}(O'') \cap C_{t_j}(o_{i'}) = C_{t_j}(O'') \neq \emptyset, \quad \forall t_j \in T_{max}(O'').$$

Therefore, condition (3) also holds. □

B.8 Proof of Lemma 7

The proof of (⇒) is trivial by definition. For (⇐), first note that by using T_max(O), the pair automatically satisfies the time-closed condition. Second, for any O'' s.t. O ⊊ O'', choose o'' ∈ O'' \ O and consider the set O^+ = O ∪ {o''}. On the one hand, if |T_max(O^+)| < |T_max(O)|, we get T_max(O^+) ⊊ T_max(O). Therefore,

$$T_{max}(O'') \subseteq T_{max}(O^+) \subsetneq T_{max}(O)$$

by Lemma 2, so (O'', T_max(O)) is not a swarm. On the other hand, if |T_max(O^+)| = |T_max(O)|, then by assumption f(O^+, T_max(O^+)) < θ × min_t. If T_max(O'') ⊊ T_max(O), then (O'', T_max(O)) is again not a swarm. Otherwise T_max(O'') = T_max(O), and we get

$$f(O'', T_{max}(O)) = f(O'', T_{max}(O'')) \le f(O^+, T_{max}(O^+)) < \theta \times \mathit{min}_t$$

by Lemma 5. So we conclude that (O'', T_max(O)) is not a swarm, and (O, T_max(O)) is object-closed. □

C.  RELATED  WORKS  FOR  COMPARISON 


C.1 Moving cluster, flock and convoy

Kalnis et al. propose the notion of moving cluster [14]. A moving cluster is a sequence of spatial clusters appearing during consecutive timestamps, such that the portion of common objects in any two consecutive clusters is not below a given threshold parameter θ, i.e., |c_t ∩ c_{t+1}| / |c_t ∪ c_{t+1}| ≥ θ, where c_t denotes a cluster at time t. A flock [10, 9, 2, 4] is a group of at least m objects that move together within a circular region of radius r during a specific time interval of at least k timestamps. Flocks can be sensitive to the user-specified disc size, and a circular shape might not always be appropriate. Jeung et al. therefore propose the concept of convoy [13, 12]. A convoy is a group of at least m objects that are density-connected with respect to distance ε during k consecutive time points. Compared with moving cluster and flock, convoy is a more flexible definition for finding moving object clusters, but it is still confined by a strong constraint on consecutive time. Therefore, we compare our effectiveness with convoy in Section 5.
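A minimal sketch of the moving-cluster condition of [14] as stated above. It assumes that each cluster is given as a set of object ids and that consecutive list entries are clusters at consecutive timestamps; the function name is ours.

```python
from typing import List, Set


def is_moving_cluster(clusters: List[Set[int]], theta: float) -> bool:
    """True if every pair of consecutive clusters c_t, c_{t+1} satisfies
    |c_t ∩ c_{t+1}| / |c_t ∪ c_{t+1}| >= theta."""
    for cur, nxt in zip(clusters, clusters[1:]):
        union = cur | nxt
        if not union or len(cur & nxt) / len(union) < theta:
            return False
    return True


seq = [{1, 2, 3}, {1, 2, 3, 4}, {2, 3, 4, 5}]  # overlap ratios 3/4, then 3/5
print(is_moving_cluster(seq, 0.5), is_moving_cluster(seq, 0.7))  # True False
```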

C.2  Group  pattern 

Wang et al. [22] further propose to mine group patterns. The definition of group pattern is similar to that of swarm, and it also addresses the time relaxation issue. A group pattern is a set of moving objects that stay within a disc of radius max_dis for a period of at least min_wei, where each consecutive time segment is no shorter than min_dur. [22] develops the VG-Growth method, whose general idea is a depth-first search based on a conditional VG-graph. Although the idea of group pattern is well motivated, the problem is not well defined. First, the "closeness" of moving objects is confined to a disc of radius max_dis. A fixed max_dis for all group patterns cannot produce natural cluster shapes. Second, since it does not consider the closure property of group patterns, it produces an exponential number of redundant patterns, which severely hinders efficiency. Our work solves both problems by using density-based clustering to define "closeness" flexibly and by introducing the closed swarm definition.

For a fair efficiency comparison in Section 5, we adapt VG-Growth to accept clusters as input. We set min_dur = 1 and min_wei = min_t. VG-Growth explores the same search space as a variant of our method that produces all swarms. The comparison therefore amounts to comparing swarm mining with our closed swarm mining. To produce all swarms, we can simply omit the Backward Pruning rule and Forward Closure Checking in ObjectGrowth. So VG-Growth essentially searches the objectset space using only the Apriori pruning rule.
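The adapted baseline can be pictured as ObjectGrowth stripped down to Apriori pruning. The Python sketch below is not VG-Growth itself, since it has no VG-graph. It only illustrates the Apriori-only objectset search that the text describes as equivalent, assuming a hypothetical snapshot format in which clusters[t][o] is o's cluster label at t. On three objects that always share a cluster, it reports four swarms, of which only ({o0, o1, o2}, {t0, t1}) is closed.

```python
from typing import Dict, FrozenSet, List, Tuple

Snapshots = List[Dict[int, int]]  # assumed format: clusters[t][o] -> cluster label


def all_swarms(clusters: Snapshots, n_objects: int,
               min_o: int, min_t: int) -> List[Tuple[List[int], List[int]]]:
    """Enumerate every swarm (not only closed ones) by objectset DFS with
    Apriori pruning only, i.e. without Backward Pruning or Forward Closure Checking."""
    out: List[Tuple[List[int], List[int]]] = []

    def grow(objs: FrozenSet[int], t_max: List[int], o_last: int) -> None:
        if len(t_max) < min_t:  # Apriori Pruning
            return
        if len(objs) >= min_o:
            out.append((sorted(objs), t_max))
        for o in range(o_last + 1, n_objects):
            # Keep timestamps where o is clustered and, below the root, joins o_last's cluster.
            t_new = [t for t in t_max
                     if clusters[t].get(o) is not None
                     and (o_last < 0 or clusters[t].get(o) == clusters[t].get(o_last))]
            grow(objs | {o}, t_new, o)

    grow(frozenset(), list(range(len(clusters))), -1)
    return out


snapshots = [{0: 0, 1: 0, 2: 0}, {0: 0, 1: 0, 2: 0}]
print(len(all_swarms(snapshots, 3, min_o=2, min_t=2)))  # 4
```

With n objects that always travel together, this enumeration reports every subset of size at least min_o. That is the exponential blow-up that closure checking avoids.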

C.3  Closed  frequent  pattern  mining 

In the overview of our algorithms in Section 4, we mention that there are three major pruning techniques for closed itemset mining in previous works [19, 25, 21]. (1) Item merging: Let X be a frequent itemset. If every transaction containing itemset X also contains itemset Y but not any proper superset of Y, then X ∪ Y forms a frequent closed itemset, and there is no need to search any itemset containing X but not Y. (2) Sub-itemset pruning: Let X be the frequent itemset currently under consideration. If X is a proper subset of an already found frequent closed itemset Y and the support of X equals that of Y, then X and all of X's descendants in the set enumeration tree cannot be frequent closed itemsets and thus can be pruned. (3) Item skipping: If a local frequent item has the same support in several header tables at different levels, one can safely prune it from the header tables at higher levels. All three pruning strategies are covered by our single Backward Pruning rule. Thus, we consider our Backward Pruning rule a novel pruning strategy that is able to detect several redundant cases at the same time.

D.  PRE-PROCESSING 
D.1 Obtaining clusters

The clustering method is not fixed in our framework. One can cluster cars along highways using a density-based method, or cluster birds in 3-D space using the k-means algorithm. Clustering methods that generate overlapping clusters, such as the EM algorithm or using an ε-disk to define a cluster, are also applicable. Also, clustering parameters are determined by users' requirements or can be indirectly controlled by users' expectations of the number of clusters at each timestamp.

Most clustering methods run in polynomial time. In our experiment, we used DBSCAN [7], which takes O(|O_db| × log|O_db| × |T_db|) in total to cluster at every timestamp. Compared with the exponential search space of swarms, such polynomial time in the preprocessing step is acceptable. To speed it up, there are also many incremental clustering methods for moving objects. Instead of computing clusters from scratch at each timestamp, clusters can be incrementally updated from the last timestamp.
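For concreteness, here is a minimal sketch of building C_db with DBSCAN, one snapshot at a time. It is a plain O(n²)-per-snapshot DBSCAN written for illustration. It uses no spatial index, so it does not achieve the O(|O_db| × log|O_db|) per-timestamp cost quoted above, and its output format (object id → cluster label, noise omitted) is our assumption. In practice, a library implementation such as scikit-learn's DBSCAN would typically be used instead.

```python
from math import dist
from typing import Dict, List, Set, Tuple

Point = Tuple[float, float]


def dbscan(points: Dict[int, Point], eps: float, min_pts: int) -> Dict[int, int]:
    """Minimal DBSCAN over one snapshot. Returns object id -> cluster label;
    noise objects are left out. Neighborhoods include the point itself."""
    ids = list(points)

    def neighbors(i: int) -> List[int]:
        return [j for j in ids if dist(points[i], points[j]) <= eps]

    labels: Dict[int, int] = {}
    visited: Set[int] = set()
    next_label = 0
    for i in ids:
        if i in visited:
            continue
        visited.add(i)
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            continue  # noise for now; may later join a cluster as a border point
        labels[i] = next_label
        queue = list(seeds)
        while queue:
            j = queue.pop()
            labels.setdefault(j, next_label)  # unlabeled border/core points join this cluster
            if j in visited:
                continue
            visited.add(j)
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # j is a core point: expand through it
                queue.extend(nbrs)
        next_label += 1
    return labels


def build_snapshots(positions: List[Dict[int, Point]], eps: float,
                    min_pts: int) -> List[Dict[int, int]]:
    """C_db: one DBSCAN labeling per timestamp."""
    return [dbscan(p, eps, min_pts) for p in positions]


snapshot = {0: (0, 0), 1: (0, 1), 2: (1, 0), 3: (10, 10), 4: (10, 11), 5: (50, 50)}
print(dbscan(snapshot, eps=1.5, min_pts=2))  # o0-o2 form one cluster, o3-o4 another, o5 is noise
```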

D.2  Estimation  of  missing  points 

For a moving object, many interpolation methods could fill in the missing points based on its own movement history. Among them, linear interpolation is the most commonly used method. Here, we propose a method based on linear interpolation to obtain the probability of each estimated missing point.

For a missing point, we consider only its immediately preceding and immediately following recorded points. Given two reported locations (x_0, y_0) at time t_0 and (x_1, y_1) at time t_1, we need to fill in the points for every timestamp between t_0 and t_1. We assume the moving object follows the linear model. The intuition behind the probability is that for a timestamp closer to t_0 or t_1, the linearly interpolated point has a higher probability of being correct. Therefore, the probability at t (t_0 ≤ t ≤ t_1) can be calculated as

$$P_t = e^{-\lambda \times \min(t - t_0,\; t_1 - t)},$$

where λ > 0 controls the sharpness of the probability function. In the extreme cases, when t = t_0 or t = t_1, the probability equals 1.0.
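A minimal sketch of this interpolation scheme for one object's 2-D track. It assumes integer timestamps and reports stored as a dict from timestamp to (x, y); the function name fill_track and the choice λ = 0.5 in the example are illustrative.

```python
from math import exp
from typing import Dict, Tuple

Point = Tuple[float, float]


def fill_track(reports: Dict[int, Point], lam: float) -> Dict[int, Tuple[Point, float]]:
    """Fill every integer timestamp between consecutive reports by linear
    interpolation and attach P_t = exp(-lam * min(t - t0, t1 - t)).
    Reported timestamps keep probability 1.0."""
    times = sorted(reports)
    filled = {t: (reports[t], 1.0) for t in times}
    for t0, t1 in zip(times, times[1:]):
        (x0, y0), (x1, y1) = reports[t0], reports[t1]
        for t in range(t0 + 1, t1):
            a = (t - t0) / (t1 - t0)  # fraction of the way from t0 to t1
            point = (x0 + a * (x1 - x0), y0 + a * (y1 - y0))
            filled[t] = (point, exp(-lam * min(t - t0, t1 - t)))
    return filled


track = fill_track({0: (0.0, 0.0), 4: (8.0, 4.0)}, lam=0.5)
print(track[1][0], round(track[1][1], 3))  # (2.0, 1.0) 0.607
```

The midpoint t = 2 receives the lowest probability, e^(-1), because it is farthest from both reported locations.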

After we obtain an initial estimate of the missing points, the ObjectGrowth+ method can be applied to mine swarms. In turn, the discovered swarms can be used to adjust the missing points. Since the initial linear interpolation is only a rough estimate of the real locations based on an object's own movement history, swarms can help estimate the missing points better by drawing on other, similar movements. The general idea is that, if we find that an object $o$ is in a swarm with objectset $O$ and the position of $o$ at timestamp $t$ is estimated, then this position can be adjusted towards the reported locations of the objects in $O$ at $t$, and the probability updated accordingly. We consider such an iterative framework for refining the swarms and the missing points a promising direction for future work.
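The adjustment step is left open here, so the following is only one hypothetical way to realize it, not a method from this paper. It assumes the adjusted position is a blend of the interpolated estimate and the centroid of the reported positions of the other swarm members at the same timestamp, with blending weight $1 - p$ (our assumption), so a fully trusted point ($p = 1$) does not move. How the probability itself should be updated is not modeled.

```python
from typing import Dict, Tuple

Point = Tuple[float, float]


def adjust_toward_swarm(estimate: Point, prob: float,
                        peers: Dict[str, Point]) -> Point:
    """Hypothetical refinement: pull an interpolated position toward the
    centroid of the reported positions of its swarm mates at the same timestamp.
    The blending weight (1 - prob) is an illustrative assumption: a point with
    prob == 1.0 is unchanged, a low-confidence estimate moves further."""
    if not 0.0 <= prob <= 1.0:
        raise ValueError("prob must lie in [0, 1]")
    if not peers:
        return estimate  # no reported swarm mates at this timestamp
    cx = sum(p[0] for p in peers.values()) / len(peers)
    cy = sum(p[1] for p in peers.values()) / len(peers)
    w = 1.0 - prob
    return ((1.0 - w) * estimate[0] + w * cx,
            (1.0 - w) * estimate[1] + w * cy)
```

With an estimate at the origin, probability 0.5, and peers reported at (2, 2) and (4, 4), the position moves halfway to their centroid, to (1.5, 1.5).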


E.  EXTENSION 

E.1  Search  based  on  timeset

ObjectGrowth searches the objectset space; similarly, the search could be conducted in the timeset space. Apriori Pruning and Backward Pruning in ObjectGrowth can be adapted to prune unnecessary parts of the timeset search space, and Forward Closure Checking can also be used to discover closed swarms during the search.

The major difference between the two search directions is that, if we fix a timeset, there could be more than one maximal corresponding objectset. For example, in the running example, if we fix the timeset as $\{t_1\}$, there are two maximal corresponding objectsets: $\{o_3\}$ and $\{o_1, o_2, o_4\}$. However, when the objectset is fixed, there is only one maximal corresponding timeset. So each node of the timeset-based DFS tree must maintain a timeset $T$ together with the set of its maximal corresponding objectsets. Accordingly, the rules need to be modified. For the Apriori Pruning rule, once a corresponding objectset has fewer than $min_o$ objects, it can be deleted, and if a node with timeset $T$ has no remaining corresponding objectset, the node can be pruned. For the Backward Pruning rule, we add one more timestamp $t_{i'}$ ($i' < i_m$ and $t_{i'} \notin T$) to $T = \{t_{i_1}, t_{i_2}, \ldots, t_{i_m}\}$; if every maximal corresponding objectset remains unchanged, this node can be pruned. Similarly, for Forward Closure Checking, if we add $t_{i'}$ ($i' > i_m$) to $T$ and every maximal corresponding objectset remains unchanged, this node is not closed. Finally, any node that passes all the rules and has $|T| \geq min_t$ is a closed swarm.
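The modified rules for a single timeset node can be sketched as below. This is a minimal illustration under stated assumptions: the clusters at each timestamp are given in advance (from whatever clustering step is used), timestamps are integers whose order matches their index order, and `EXAMPLE_CLUSTERS`, `keep_maximal`, `maximal_objectsets`, and `check_node` are hypothetical names with made-up toy data, chosen so that timestamp 1 has the two maximal objectsets $\{o_3\}$ and $\{o_1, o_2, o_4\}$.

```python
from typing import Dict, FrozenSet, Iterable, List, Set

ObjectSet = FrozenSet[str]
Clusters = Dict[int, List[ObjectSet]]  # timestamp -> clusters at that timestamp

# Hypothetical toy data (not from the paper's experiments).
EXAMPLE_CLUSTERS: Clusters = {
    0: [frozenset({"o1", "o2", "o3", "o4"})],
    1: [frozenset({"o1", "o2", "o4"}), frozenset({"o3"})],
    2: [frozenset({"o1", "o2"}), frozenset({"o3", "o4"})],
    3: [frozenset({"o1", "o2", "o3", "o4"})],
}


def keep_maximal(sets: Set[ObjectSet]) -> Set[ObjectSet]:
    """Drop every objectset strictly contained in another one."""
    return {s for s in sets if not any(s < other for other in sets)}


def maximal_objectsets(timeset: Iterable[int], clusters: Clusters,
                       min_o: int) -> Set[ObjectSet]:
    """Maximal objectsets (size >= min_o) sharing a cluster at every t in timeset."""
    ts = sorted(timeset)
    if not ts:
        raise ValueError("timeset must be non-empty")
    candidates = {c for c in clusters[ts[0]] if len(c) >= min_o}
    for t in ts[1:]:
        # Apriori-style deletion: an intersection below min_o only shrinks further.
        candidates = keep_maximal({c & d for c in candidates for d in clusters[t]
                                   if len(c & d) >= min_o})
    return keep_maximal(candidates)


def check_node(timeset: List[int], clusters: Clusters,
               min_o: int, min_t: int) -> str:
    """Apply the adapted Apriori, Backward Pruning and Forward Closure rules."""
    current = maximal_objectsets(timeset, clusters, min_o)
    if not current:
        return "pruned (apriori)"
    last = max(timeset)
    others = [t for t in sorted(clusters) if t not in timeset]

    def unchanged(extra: int) -> bool:
        # True if adding `extra` leaves every maximal objectset as it is.
        return maximal_objectsets(list(timeset) + [extra], clusters, min_o) == current

    if any(unchanged(t) for t in others if t < last):
        return "pruned (backward)"
    if any(unchanged(t) for t in others if t > last):
        return "not closed"
    return "closed swarm" if len(timeset) >= min_t else "below min_t"
```

On this toy data with $min_o = 2$, the node $\{2\}$ is pruned backward (adding timestamp 0 changes nothing), $\{0, 2\}$ is not closed (adding 3 changes nothing), and $\{0, 2, 3\}$ is reported as a closed swarm when $min_t \le 3$. A node labeled "not closed" would still be expanded in the DFS; only backward-pruned nodes are cut.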

Comparing the two search methods, ObjectGrowth is suited to mining groups of moving objects that travel together for a considerably long time, whereas search based on timeset is more efficient at finding large groups of objects moving together. In most real applications, we track a certain set of moving objects over a long time, for example, 100 moving objects over a year. Therefore, it is usually the case that $|T_{DB}| \gg |O_{DB}|$. So search based on timeset is often less efficient than search based on objectset, because its search space is much larger. Thus, we introduce ObjectGrowth as our major algorithm.
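To make the size comparison concrete, the following counts the nodes of each full enumeration tree before any pruning. The 100-object example comes from the text above; one timestamp per day is our assumption, since only "over a year" is stated. Pruning shrinks both trees in practice, so these are worst-case sizes only.

```latex
% Full (unpruned) search spaces
|\mathcal{S}_{\text{objectset}}| = 2^{|O_{DB}|}, \qquad
|\mathcal{S}_{\text{timeset}}| = 2^{|T_{DB}|}

% Illustration: |O_{DB}| = 100, |T_{DB}| = 365 (assuming daily timestamps)
2^{100} \approx 1.27 \times 10^{30}, \qquad
2^{365} \approx 7.5 \times 10^{109}
```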

E.2  Enforcing  gap  constraint 

When $min_t$ is much smaller than $|T_{DB}|$, two kinds of swarms with rather different meanings could be discovered. For example, if the whole time span is 365 days and $min_t$ is set to 30 (days), a swarm could be a group of objects that move together for only one month but stay far apart for the remaining 11 months, or a set of objects that gather in every month throughout the year.

For the first case, we can specify a time range and then discover swarms within it. The latter case requires more strategies. One solution is, for a swarm $(O, T)$, to enforce a gap constraint on the time dimension $T$. For example, if we set min_gap = 7 (days) and two objects are together for 14 consecutive days, these 14 days can contribute at most 2 toward $min_t$, because we require a gap of at least 7 between any two timestamps in $T$.

To embed min_gap in our ObjectGrowth method, we can compute an upper bound for $T_{max}(O)$ at each node of the search tree with objectset $O$. The upper bound can be computed with the greedy algorithm shown in Algorithm 2. Accordingly, the size of $T_{max}(O)$ is no longer measured simply by $|T_{max}(O)|$; instead, it is measured by this upper bound, and all the pruning rules are affected accordingly.


Algorithm 2 Calculate upper bound of T_max(O)

Input: T_max(O) = {t_1, t_2, ..., t_m} with t_1 < t_2 < ... < t_m, and min_gap.
Output: upper bound of T_max(O).

1: upper_bound ← 1;
2: last_t ← t_m;
3: for i ← m - 1 down to 1 do
4:     if last_t - t_i ≥ min_gap then
5:         upper_bound ← upper_bound + 1;
6:         last_t ← t_i;
7: return upper_bound;
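Algorithm 2 translates directly into Python. This is a minimal sketch: `gap_upper_bound` is a hypothetical name, timestamps are assumed to be integers (e.g. day numbers), and the test `last_t - t >= min_gap` follows the "gap of at least min_gap" wording above. Scanning from the latest timestamp and always keeping the next one that is far enough back is the standard greedy choice for this kind of spacing problem.

```python
from typing import Iterable


def gap_upper_bound(timeset: Iterable[int], min_gap: int) -> int:
    """Greedy count of timestamps in T_max(O) that are pairwise at least
    min_gap apart, scanning backwards from the latest one (Algorithm 2)."""
    ts = sorted(timeset)
    if not ts:
        return 0  # empty timeset: nothing to count
    upper_bound = 1
    last_t = ts[-1]
    for t in reversed(ts[:-1]):
        if last_t - t >= min_gap:
            upper_bound += 1
            last_t = t  # only the last kept timestamp constrains the next one
    return upper_bound
```

For the 14-consecutive-day example with min_gap = 7, `gap_upper_bound(range(1, 15), 7)` keeps days 14 and 7 and returns 2.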


E.3  Sampling 

The size of a trajectory dataset depends on two factors, $|O_{DB}|$ and $|T_{DB}|$. In animal movement studies, $|O_{DB}|$ is usually relatively small: tracking animals is expensive, so the number of tracked animals seldom reaches the hundreds. When tracking vehicles, the number could be in the thousands. In both cases, $|T_{DB}|$ is usually large. When $|T_{DB}|$ is very large, we may use sampling as a pre-processing step to reduce the data size. For example, if a location is recorded every second, $|T_{DB}|$ will be 86,400 for one day. We can first keep one location per minute to reduce the size to 1,440. This relies on the assumption that a moving object will not travel too far within a short time.
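The per-minute sampling step can be sketched as keeping the first record in each fixed time window. This is only an illustration, assuming each record is a `(timestamp_in_seconds, x, y)` tuple; `sample_per_window` and its `window` parameter are hypothetical names, and other choices (e.g. the window's middle record) would serve equally well.

```python
from typing import List, Optional, Tuple

Record = Tuple[int, float, float]  # (timestamp in seconds, x, y)


def sample_per_window(records: List[Record], window: int = 60) -> List[Record]:
    """Keep the first record of each `window`-second bucket, in time order."""
    if window <= 0:
        raise ValueError("window must be positive")
    kept: List[Record] = []
    last_bucket: Optional[int] = None
    for rec in sorted(records, key=lambda r: r[0]):
        bucket = rec[0] // window
        if bucket != last_bucket:  # first record seen in this bucket
            kept.append(rec)
            last_bucket = bucket
    return kept
```

Applied to one day of per-second records (86,400 of them), this keeps 1,440 records, one per minute.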
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